
Git hints, tips and troubleshooting

Here I document the things I have discovered about git and its behaviour while using it over the years: tips, hints, troubleshooting, and automation scripts I have written to make my life easier and felt were worth sharing. This page is a reminder of things I have done before, so I don't have to reinvent the wheel, and I hope it also helps others who have the same (or similar) goals.

Auto-modifying committed files with git hooks

For some context, this website runs Blosxom and each of these articles is an HTML file. These are stored in a git repo before being published live. At the bottom of each page you will notice a "Page created" and "Page last modified" footer. This is generated by a Perl script, which I currently run manually on each file I change.

I wanted to automate this in a git hook, so that every time I ran git commit it would update the "last modified" timestamp of each modified file. The first thing I did was search online, and surprisingly I didn't find anyone who had managed it. The web was full of people who had tried to do this in pre-commit hooks and run into problems, ranging from errors like "Error: Cannot Lock Ref" to the auto-modified files not being part of the commit (forcing a manual commit after the auto changes). The general consensus was that "it could not be done".

I gave it a go myself, and yes, I found you could not do it with a pre-commit hook. The problem is that git uses lockfiles to make sure only one git process modifies a repo's index and refs at a time. When you run "git commit", the process launches the pre-commit hook and waits until the hook finishes. If your script tries to change the files and commit them during this time, it will fail with "Error: Cannot Lock Ref", because the newly launched git commit process is locked out of the repo while the original commit process is still running. If you don't try to commit your changes, the original git commit completes without error, but your auto-changes are left as uncommitted changes in your working tree rather than being part of the last commit.

My solution was to use a post-commit hook instead. Unfortunately this results in two commits instead of one, but it does let me modify the files the way I wanted. To do this, the script needs to know two things:

  1. The message of the last commit. As our post-commit hook itself makes a commit, we need some way of checking whether the last commit was made by the hook, otherwise we would end up in an infinite commit loop. We do this by checking whether a specific marker string is in the last commit message.
  2. Which files were modified in the last commit, so that the updates are applied only to them.

The core part of the Perl script is shown below. I've omitted the subroutines that make the actual changes, as they are specific to my use case, but it shows the logic involved in automatically changing the files modified by the last commit. Currently it only changes article files, but nothing stops you adding subroutines to handle other use cases.


#!/usr/bin/perl
use strict;
use warnings;

# First we check whether the last commit was made by this hook.
# If so we do nothing more (to prevent an infinite commit loop).
my $lastlog = `git log -1 --format=%B`;
exit(0) if ($lastlog =~ m/post-commit-hook:/);

# The command below lists each file changed by the last commit and what
# was done to it: "M" if the file was modified, "R" if it was renamed, etc.
# Each line is tab-separated, e.g. "M<TAB>path/to/file".
my @updated;
foreach (`git diff --name-status HEAD^ HEAD`) {
	chomp;
	# We only want to update modified files, not files that have been
	# renamed/moved/other, so skip any line that doesn't start with "M".
	next unless /^M\t(.*)$/;
	my $file = $1;

	# Update CMS article files (updateTimestamp() is one of the omitted
	# subroutines and must be defined elsewhere in the script).
	if ($file =~ m~/article$~) {
		updateTimestamp($file);
		push @updated, $file;
	}
}

# Nothing was updated, so there is nothing to commit.
exit(0) unless @updated;

# As we made changes, we have to commit again, with an auto-message for
# reference. Only the files we updated are committed.
system('git', 'commit', '-m', 'post-commit-hook: Updating CMS timestamps', '--', @updated);
exit(0);

With this system you end up with two commits for every human commit, half of which are auto-commits with the same message, but it works, as the footers on the pages of this website show :-)
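For anyone who wants to try the post-commit approach without Perl, here is a minimal shell sketch of the same logic (a translation, not the script this site runs). It builds a throwaway repo, installs a post-commit hook at .git/hooks/post-commit (where git looks for hooks unless core.hooksPath is set) and makes one human commit that modifies a file. The function name `demo_double_commit`, the file `page.article` and the appended "last modified" line are hypothetical stand-ins for the real timestamp update. The hook's own commit re-triggers the hook, which then sees the marker and stops.

```shell
#!/bin/sh
# Hypothetical demo of the post-commit "double commit" pattern in a throwaway repo.
# The function name, file name and appended line are illustrative only.
demo_double_commit() (
    repo=$1
    git init -q "$repo" || exit 1
    cd "$repo" || exit 1
    git config user.name "Demo"
    git config user.email "demo@example.com"

    # Install the hook where git looks by default, and make it executable.
    cat > .git/hooks/post-commit <<'EOF'
#!/bin/sh
# 1. Loop guard: do nothing if the last commit was made by this hook.
git log -1 --format=%B | grep -q 'post-commit-hook:' && exit 0

# 2. Files with status "M" in the last commit (output is tab-separated).
#    HEAD^ does not exist for the very first commit, so silence that error.
files=$(git diff --name-status HEAD^ HEAD 2>/dev/null | awk -F '\t' '$1 == "M" { print $2 }')
[ -n "$files" ] || exit 0

# Hypothetical stand-in for the real timestamp update.
# (Word splitting on $files assumes file names without spaces.)
for f in $files; do
    echo "last modified: $(date)" >> "$f"
done

# Commit only the files this hook touched, with the marker message.
git commit -q -m 'post-commit-hook: Updating CMS timestamps' -- $files
EOF
    chmod +x .git/hooks/post-commit

    echo "v1" > page.article
    git add page.article
    git commit -q -m "Add page" || exit 1     # status "A": hook does nothing
    echo "v2" > page.article
    git commit -q -am "Edit page" || exit 1   # status "M": hook adds a second commit
)
```

Afterwards, git log --oneline in the demo repo should show the "Edit page" commit followed by the hook's auto-commit.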

Tracking git hooks within the repo

If you want your hooks to be part of your repo (so that changes can be tracked and the hooks kept in sync across repos), on modern git versions (2.9 and later) you can put them in a directory of your choice inside the repo, "git add" them, then set your local git config to use that directory for hooks:

git config --local core.hooksPath ./.githooks/

This setting lives in the repo's local config, which is not copied when a repo is cloned, so it needs to be set again in every new clone for the hooks to work.
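Since the setting has to be reapplied after every clone, one option is to wrap cloning in a small shell function. This is a minimal sketch, assuming the hooks are tracked in .githooks/ at the top of the repo as above; the name `clone_with_hooks` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: clone a repo, then point git at the tracked hooks
# directory, since local config (including core.hooksPath) is never cloned.
clone_with_hooks() {
    url=$1
    dir=$2
    git clone -q "$url" "$dir" || return 1
    # Only set core.hooksPath if the clone actually contains .githooks/.
    if [ -d "$dir/.githooks" ]; then
        git -C "$dir" config --local core.hooksPath ./.githooks/
    fi
}
```

For example, clone_with_hooks "$url" site, where $url is whatever you normally clone from.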

Multi-origin git

Git was originally written for the Linux kernel developers as a replacement for BitKeeper, a proprietary distributed source control system, but many people who adopted it came from centralised systems such as SVN (itself an improvement on the older CVS). Both CVS and SVN have a single "source control" server that clients commit to and update from. When git came along, a lot of people who moved across just mapped their old centralised way of thinking onto git, without considering the possibilities offered by a decentralised source control system.

The result is that we now have "git servers", such as GitLab, Gitea and GitHub, that clients push to and pull from. This server is usually called "origin" (the name git clone gives the remote you cloned from), but nothing in git prevents you from having multiple remote targets, as long as they have different names.

In my case the need arose when I was travelling abroad without internet access. I wanted to make sure my local changes would not be lost in case the laptop got stolen or damaged, but I could not push to my git server. I always carried a backup USB drive for important documents, so I could have made a kludge with git hooks and a script to rsync my repo across.

Instead I decided to use git's decentralised design for a more elegant solution: define a second remote called "backup", so that I could push to either "origin" (my git server, and the default) or my backup, depending on which was available, using git itself to keep everything in sync.

To do this I had to create a bare git repo on my backup USB disk for each repo I wanted to back up:

git init --bare $fullBackupPath

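The next step is to add the backup entry to each existing repo, pointing it at that repo's own bare repo on the disk ($fullBackupPath must be an absolute path). Use double quotes around the URL so the shell expands $fullBackupPath (single quotes would store it literally), and quote the fetch refspec so the shell leaves the * alone:

git config --worktree --add remote.backup.url "file://$fullBackupPath"
git config --worktree --add remote.backup.fetch '+refs/heads/*:refs/remotes/backup/*'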
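Because each repo needs its own bare repo on the disk, both steps can be wrapped in one function. This is a minimal sketch, assuming a hypothetical layout where each repo's backup lives at <backup root>/<repo directory name>.git. It uses git remote add, which writes the same remote.backup.url and remote.backup.fetch entries as the two config commands (to the repo's local config). The name `add_backup_remote` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: create a bare backup repo for one project on the backup
# disk and register it as that project's "backup" remote.
#   $1 = path to the project's working tree
#   $2 = backup root on the USB disk (assumed to be mounted already)
add_backup_remote() (
    # Resolve the backup root to an absolute path before changing directory.
    root=$(cd "$2" && pwd) || exit 1
    cd "$1" || exit 1
    # One bare repo per project, named after the project directory.
    fullBackupPath="$root/$(basename "$(pwd)").git"
    if [ ! -d "$fullBackupPath" ]; then
        git init -q --bare "$fullBackupPath" || exit 1
    fi
    # Sets remote.backup.url plus the default fetch refspec.
    git remote add backup "file://$fullBackupPath"
)
```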

Now that we have two remotes, we can choose which one to push to or pull from, as in the following example:

git push origin                  # push current branch to the git server
git push backup                  # push current branch to the backup disk
git pull origin                  # pull current branch from the git server
git pull backup $yourBranchName  # pull from the backup disk (name the branch, as backup is not its upstream)

The first push to backup copies the entire history of the branch to the bare repo, so depending on its size it may take a while.

Unfortunately git has no built-in way to push to or pull from several remotes in one go depending on which of them is available, so if you want that ability you will have to script it. For my workflow I wrote a Perl git wrapper that checks whether my backup disk is mounted at my backup path and whether the git server is reachable, and depending on the result directs my command to one or both remotes. That way I just call a single command and it does the magic behind the scenes to keep everything in sync.
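The Perl wrapper itself isn't shown here, but the idea can be sketched in shell. This is a minimal sketch, not that wrapper: it assumes the backup remote uses a file:// URL and treats the backup as available when that path exists (a stand-in for a proper "is the disk mounted" check), and treats origin as reachable when git ls-remote origin succeeds. The name `push_all` is hypothetical.

```shell
#!/bin/sh
# Hypothetical "push to whatever is available" helper.
# Pushes the current branch to origin and/or backup, skipping any remote
# that is not currently available. Returns non-zero if nothing was pushed.
push_all() {
    branch=$(git symbolic-ref --short HEAD) || return 1
    pushed=0

    # origin: consider it reachable if we can list its refs.
    if git ls-remote origin > /dev/null 2>&1; then
        git push -q origin "$branch" && pushed=1
    else
        echo "origin not reachable, skipping" >&2
    fi

    # backup: assume a file:// URL and check that the path exists
    # (a stand-in for checking that the USB disk is mounted).
    url=$(git config --get remote.backup.url)
    path=${url#file://}
    if [ -n "$url" ] && [ -d "$path" ]; then
        git push -q backup "$branch" && pushed=1
    else
        echo "backup disk not available, skipping" >&2
    fi

    [ "$pushed" -eq 1 ]
}
```

For a remote server you would probably also want a timeout on the reachability check, as git ls-remote can take a while to fail when the network is down.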

If you don't want to specify the remote every time you push or pull, you need to set a default remote (the upstream) for your branch. You can do this with:

git push --set-upstream origin $yourBranchName

This sets "origin" as the remote your branch pushes to and pulls from by default, which is git's usual, mainstream behaviour. The setting is per branch, so it needs doing once for each branch. You then have to say explicitly when you want to push to "backup" rather than "origin". You can of course set "backup" or any other remote as the upstream instead, but this is what works for me.
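To check which remote the current branch will use by default, you can read the branch's configuration, which git push --set-upstream writes as branch.<name>.remote. A small sketch; the function name `default_remote` is hypothetical (git branch -vv shows the same information for every branch).

```shell
#!/bin/sh
# Hypothetical helper: print the default remote of the current branch,
# as recorded by "git push --set-upstream". Fails if none is set.
default_remote() {
    branch=$(git symbolic-ref --short HEAD) || return 1
    git config --get "branch.$branch.remote"
}
```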

[ Page created: Sat Dec 30 18:34:00 2023 ][ Page last modified: Thu Oct 24 20:47:56 2024 ]